Hands-on Exercise 10: Calibrating Hedonic Pricing Model for Private High-rise Properties with the GWR Method

Author

Lorielle Malveda

Published

October 2, 2024

Modified

October 15, 2024

1. OVERVIEW

Geographically Weighted Regression (GWR) is a spatial statistical technique that takes non-stationary variables into consideration (e.g., climate; demographic factors; physical environment characteristics) and models the local relationships between these independent variables and an outcome of interest (also known as dependent variable).

In this Hands-On exercise, we are going to build hedonic pricing models using GWR methods.

We will use 2015 RESALE CONDOMINIUM PRICES as our dependent variable.

The independent variables are divided into either structural or locational.

2. THE DATA

Two data sets will be used in this model building exercise, they are:

  • URA Master Plan subzone boundary in shapefile format –> MP14_SUBZONE_WEB_PL

  • condo_resale_2015 in csv format –> condo_resale_2015.csv

3. GETTING STARTED

Let’s install the necessary R packages and launch these into the environment.

  • olsrr: R package for building OLS and performing diagnostics tests

  • GWmodel: R package for calibrating geographical weighted family of models

  • corrplot: R package for multivariate data visualisation and analysis

  • sf: Spatial data handling

  • tidyverse, including readr, ggplot2, and dplyr: Attribute data handling

  • tmap: choropleth mapping

The code chunks below installs and launches these R packages into R environment.

pacman::p_load(olsrr, corrplot, ggpubr, sf, spdep, GWmodel, tmap, tidyverse, gt, gtsummary)

4. A SHORT NOTE ABOUT GWmodel

The GWmodel package provides a collection of localized spatial statistical methods, which are:

  1. GW summary statistics
  2. GW principal components analysis
  3. GW discriminant analysis and various forms of GW regression

Typically, the outputs or parameters of the GWmodel are visualized through mapping, serving as a valuable exploratory tool that can often guide or complement more traditional or advanced statistical analyses.

5. GEOSPATIAL DATA WRANGLING

5.1 Importing Geospatial Data

The code chunk below is used to import MP_SUBZONE_WEB_PL shapefile by using st_read() of sf packages.

mpsz = st_read(dsn = "geospatial", layer = "MP14_SUBZONE_WEB_PL")
Reading layer `MP14_SUBZONE_WEB_PL' from data source 
  `C:\loriellemalveda\ISSS626-GAA\Hands-on_Ex\Hands-on_Ex10\geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 323 features and 15 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
Projected CRS: SVY21

The report indicates that the R object containing the imported MP14_SUBZONE_WEB_PL shapefile is named mpsz and is classified as a simple feature object with a geometry type of multipolygon. It is also important to highlight that the mpsz object lacks EPSG (spatial reference system) information.

5.2 Updating CRS Information

The code chunk below updates the newly imported mpsz with the correct ESPG code (i.e. 3414)

mpsz_svy21 <- st_transform(mpsz, 3414)

After transforming the projection metadata, let’s verify the projection of the newly transformed mpsz_svy21 by using st_crs() of the sf package.

st_crs(mpsz_svy21)
Coordinate Reference System:
  User input: EPSG:3414 
  wkt:
PROJCRS["SVY21 / Singapore TM",
    BASEGEOGCRS["SVY21",
        DATUM["SVY21",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4757]],
    CONVERSION["Singapore Transverse Mercator",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["northing (N)",north,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["easting (E)",east,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Cadastre, engineering survey, topographic mapping."],
        AREA["Singapore - onshore and offshore."],
        BBOX[1.13,103.59,1.47,104.07]],
    ID["EPSG",3414]]

EPSG is tagged as 3414 now.

Next, let’sl reveal the extent of mpsz_svy21 by using the st_bbox() function of the sf package.

st_bbox(mpsz_svy21) #view extent
     xmin      ymin      xmax      ymax 
 2667.538 15748.721 56396.440 50256.334 

6. ASPATIAL DATA WRANGLING

6.1 Importing the Aspatial Data

Let’s use the read_csv() function of the readr package to import condo_resale_2015 into R as a tibble data frame called condo_resale.

condo_resale = read_csv("data/Condo_resale_2015.csv")

Next, it is important for us to examine if the data file has been imported correctly.

Let’s use glimpse() to display the data structure.

glimpse(condo_resale)
Rows: 1,436
Columns: 23
$ LATITUDE             <dbl> 1.287145, 1.328698, 1.313727, 1.308563, 1.321437,…
$ LONGITUDE            <dbl> 103.7802, 103.8123, 103.7971, 103.8247, 103.9505,…
$ POSTCODE             <dbl> 118635, 288420, 267833, 258380, 467169, 466472, 3…
$ SELLING_PRICE        <dbl> 3000000, 3880000, 3325000, 4250000, 1400000, 1320…
$ AREA_SQM             <dbl> 309, 290, 248, 127, 145, 139, 218, 141, 165, 168,…
$ AGE                  <dbl> 30, 32, 33, 7, 28, 22, 24, 24, 27, 31, 17, 22, 6,…
$ PROX_CBD             <dbl> 7.941259, 6.609797, 6.898000, 4.038861, 11.783402…
$ PROX_CHILDCARE       <dbl> 0.16597932, 0.28027246, 0.42922669, 0.39473543, 0…
$ PROX_ELDERLYCARE     <dbl> 2.5198118, 1.9333338, 0.5021395, 1.9910316, 1.121…
$ PROX_URA_GROWTH_AREA <dbl> 6.618741, 7.505109, 6.463887, 4.906512, 6.410632,…
$ PROX_HAWKER_MARKET   <dbl> 1.76542207, 0.54507614, 0.37789301, 1.68259969, 0…
$ PROX_KINDERGARTEN    <dbl> 0.05835552, 0.61592412, 0.14120309, 0.38200076, 0…
$ PROX_MRT             <dbl> 0.5607188, 0.6584461, 0.3053433, 0.6910183, 0.528…
$ PROX_PARK            <dbl> 1.1710446, 0.1992269, 0.2779886, 0.9832843, 0.116…
$ PROX_PRIMARY_SCH     <dbl> 1.6340256, 0.9747834, 1.4715016, 1.4546324, 0.709…
$ PROX_TOP_PRIMARY_SCH <dbl> 3.3273195, 0.9747834, 1.4715016, 2.3006394, 0.709…
$ PROX_SHOPPING_MALL   <dbl> 2.2102717, 2.9374279, 1.2256850, 0.3525671, 1.307…
$ PROX_SUPERMARKET     <dbl> 0.9103958, 0.5900617, 0.4135583, 0.4162219, 0.581…
$ PROX_BUS_STOP        <dbl> 0.10336166, 0.28673408, 0.28504777, 0.29872340, 0…
$ NO_Of_UNITS          <dbl> 18, 20, 27, 30, 30, 31, 32, 32, 32, 32, 34, 34, 3…
$ FAMILY_FRIENDLY      <dbl> 0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0…
$ FREEHOLD             <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1…
$ LEASEHOLD_99YR       <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
head(condo_resale$LONGITUDE) #see the data in XCOORD column
[1] 103.7802 103.8123 103.7971 103.8247 103.9505 103.9386
head(condo_resale$LATITUDE) #see the data in YCOORD column
[1] 1.287145 1.328698 1.313727 1.308563 1.321437 1.314198

Next, summary() of base R is used to display the summary statistics of condo_resale tibble data frame.

summary(condo_resale)
    LATITUDE       LONGITUDE        POSTCODE      SELLING_PRICE     
 Min.   :1.240   Min.   :103.7   Min.   : 18965   Min.   :  540000  
 1st Qu.:1.309   1st Qu.:103.8   1st Qu.:259849   1st Qu.: 1100000  
 Median :1.328   Median :103.8   Median :469298   Median : 1383222  
 Mean   :1.334   Mean   :103.8   Mean   :440439   Mean   : 1751211  
 3rd Qu.:1.357   3rd Qu.:103.9   3rd Qu.:589486   3rd Qu.: 1950000  
 Max.   :1.454   Max.   :104.0   Max.   :828833   Max.   :18000000  
    AREA_SQM          AGE           PROX_CBD       PROX_CHILDCARE    
 Min.   : 34.0   Min.   : 0.00   Min.   : 0.3869   Min.   :0.004927  
 1st Qu.:103.0   1st Qu.: 5.00   1st Qu.: 5.5574   1st Qu.:0.174481  
 Median :121.0   Median :11.00   Median : 9.3567   Median :0.258135  
 Mean   :136.5   Mean   :12.14   Mean   : 9.3254   Mean   :0.326313  
 3rd Qu.:156.0   3rd Qu.:18.00   3rd Qu.:12.6661   3rd Qu.:0.368293  
 Max.   :619.0   Max.   :37.00   Max.   :19.1804   Max.   :3.465726  
 PROX_ELDERLYCARE  PROX_URA_GROWTH_AREA PROX_HAWKER_MARKET PROX_KINDERGARTEN 
 Min.   :0.05451   Min.   :0.2145       Min.   :0.05182    Min.   :0.004927  
 1st Qu.:0.61254   1st Qu.:3.1643       1st Qu.:0.55245    1st Qu.:0.276345  
 Median :0.94179   Median :4.6186       Median :0.90842    Median :0.413385  
 Mean   :1.05351   Mean   :4.5981       Mean   :1.27987    Mean   :0.458903  
 3rd Qu.:1.35122   3rd Qu.:5.7550       3rd Qu.:1.68578    3rd Qu.:0.578474  
 Max.   :3.94916   Max.   :9.1554       Max.   :5.37435    Max.   :2.229045  
    PROX_MRT         PROX_PARK       PROX_PRIMARY_SCH  PROX_TOP_PRIMARY_SCH
 Min.   :0.05278   Min.   :0.02906   Min.   :0.07711   Min.   :0.07711     
 1st Qu.:0.34646   1st Qu.:0.26211   1st Qu.:0.44024   1st Qu.:1.34451     
 Median :0.57430   Median :0.39926   Median :0.63505   Median :1.88213     
 Mean   :0.67316   Mean   :0.49802   Mean   :0.75471   Mean   :2.27347     
 3rd Qu.:0.84844   3rd Qu.:0.65592   3rd Qu.:0.95104   3rd Qu.:2.90954     
 Max.   :3.48037   Max.   :2.16105   Max.   :3.92899   Max.   :6.74819     
 PROX_SHOPPING_MALL PROX_SUPERMARKET PROX_BUS_STOP       NO_Of_UNITS    
 Min.   :0.0000     Min.   :0.0000   Min.   :0.001595   Min.   :  18.0  
 1st Qu.:0.5258     1st Qu.:0.3695   1st Qu.:0.098356   1st Qu.: 188.8  
 Median :0.9357     Median :0.5687   Median :0.151710   Median : 360.0  
 Mean   :1.0455     Mean   :0.6141   Mean   :0.193974   Mean   : 409.2  
 3rd Qu.:1.3994     3rd Qu.:0.7862   3rd Qu.:0.220466   3rd Qu.: 590.0  
 Max.   :3.4774     Max.   :2.2441   Max.   :2.476639   Max.   :1703.0  
 FAMILY_FRIENDLY     FREEHOLD      LEASEHOLD_99YR  
 Min.   :0.0000   Min.   :0.0000   Min.   :0.0000  
 1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000  
 Median :0.0000   Median :0.0000   Median :0.0000  
 Mean   :0.4868   Mean   :0.4227   Mean   :0.4882  
 3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:1.0000  
 Max.   :1.0000   Max.   :1.0000   Max.   :1.0000  

6.2 Converting the Aspatial Dataframe Into An sf Object

Currently, the condo_resale tibble data frame is aspatial data. We will convert this to an sf object using the code chunk below.

condo_resale.sf <- st_as_sf(condo_resale,
                            coords = c("LONGITUDE", "LATITUDE"),
                            crs=4326) %>%
  st_transform(crs=3414)

The st_transform() function from the sf package is used to convert the coordinates from the WGS84 coordinate reference system (CRS: 4326) to SVY21 (CRS: 3414). After the transformation, the head() function is applied to display the first few rows of the condo_resale.sf object.

head(condo_resale.sf)
Simple feature collection with 6 features and 21 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 22085.12 ymin: 29951.54 xmax: 41042.56 ymax: 34546.2
Projected CRS: SVY21 / Singapore TM
# A tibble: 6 × 22
  POSTCODE SELLING_PRICE AREA_SQM   AGE PROX_CBD PROX_CHILDCARE PROX_ELDERLYCARE
     <dbl>         <dbl>    <dbl> <dbl>    <dbl>          <dbl>            <dbl>
1   118635       3000000      309    30     7.94          0.166            2.52 
2   288420       3880000      290    32     6.61          0.280            1.93 
3   267833       3325000      248    33     6.90          0.429            0.502
4   258380       4250000      127     7     4.04          0.395            1.99 
5   467169       1400000      145    28    11.8           0.119            1.12 
6   466472       1320000      139    22    10.3           0.125            0.789
# ℹ 15 more variables: PROX_URA_GROWTH_AREA <dbl>, PROX_HAWKER_MARKET <dbl>,
#   PROX_KINDERGARTEN <dbl>, PROX_MRT <dbl>, PROX_PARK <dbl>,
#   PROX_PRIMARY_SCH <dbl>, PROX_TOP_PRIMARY_SCH <dbl>,
#   PROX_SHOPPING_MALL <dbl>, PROX_SUPERMARKET <dbl>, PROX_BUS_STOP <dbl>,
#   NO_Of_UNITS <dbl>, FAMILY_FRIENDLY <dbl>, FREEHOLD <dbl>,
#   LEASEHOLD_99YR <dbl>, geometry <POINT [m]>

The output is in a point feature data frame.

7. EXPLORATORY DATA ANALYSIS

In the section, we will perform EDA using the statistical graphics functions of the ggplot2 package.

7.1 EDA Using Statistical Graphics

We can plot the distribution of SELLING_PRICE by using the appropriate Exploratory Data Analysis (EDA) techniques as shown in the code chunk below.

ggplot(data=condo_resale.sf, aes(x=`SELLING_PRICE`)) +
  geom_histogram(bins=20, color="black", fill="light blue")

The figure above reveals a right skewed distribution. This means that more condominium units were transacted at relative lower prices.

Statistically, a skewed distribution can be normalized by applying a log transformation. The code snippet below demonstrates how to create a new variable, LOG_SELLING_PRICE, by applying a log transformation to the SELLING_PRICE variable. This operation is performed using the mutate() function from the dplyr package.

condo_resale.sf <- condo_resale.sf %>%
  mutate(`LOG_SELLING_PRICE` = log(SELLING_PRICE))

Now, let’s plot the LOG_SELLING_PRICE using the code chunk below.

ggplot(data=condo_resale.sf, aes(x=`LOG_SELLING_PRICE`)) +
  geom_histogram(bins=20, color="black", fill="light blue")

Now, the distribution is relatively less skewed after the transformation.

7.2 Multiple Histogram Plots Distribution of Variables

In this section, let’s create small multiple histograms, which is also known as a trellis plot, using the ggarrange() function from the ggpubr package.

The code chunk below generates 12 histograms and then organizes them into a 3-column by 4-row layout using ggarrange() to form a small multiple plot.

AREA_SQM <- ggplot(data=condo_resale.sf, aes(x= `AREA_SQM`)) + 
  geom_histogram(bins=20, color="black", fill="light blue")

AGE <- ggplot(data=condo_resale.sf, aes(x= `AGE`)) +
  geom_histogram(bins=20, color="black", fill="light blue")

PROX_CBD <- ggplot(data=condo_resale.sf, aes(x= `PROX_CBD`)) +
  geom_histogram(bins=20, color="black", fill="light blue")

PROX_CHILDCARE <- ggplot(data=condo_resale.sf, aes(x= `PROX_CHILDCARE`)) + 
  geom_histogram(bins=20, color="black", fill="light blue")

PROX_ELDERLYCARE <- ggplot(data=condo_resale.sf, aes(x= `PROX_ELDERLYCARE`)) +
  geom_histogram(bins=20, color="black", fill="light blue")

PROX_URA_GROWTH_AREA <- ggplot(data=condo_resale.sf, 
                               aes(x= `PROX_URA_GROWTH_AREA`)) +
  geom_histogram(bins=20, color="black", fill="light blue")

PROX_HAWKER_MARKET <- ggplot(data=condo_resale.sf, aes(x= `PROX_HAWKER_MARKET`)) +
  geom_histogram(bins=20, color="black", fill="light blue")

PROX_KINDERGARTEN <- ggplot(data=condo_resale.sf, aes(x= `PROX_KINDERGARTEN`)) +
  geom_histogram(bins=20, color="black", fill="light blue")

PROX_MRT <- ggplot(data=condo_resale.sf, aes(x= `PROX_MRT`)) +
  geom_histogram(bins=20, color="black", fill="light blue")

PROX_PARK <- ggplot(data=condo_resale.sf, aes(x= `PROX_PARK`)) +
  geom_histogram(bins=20, color="black", fill="light blue")

PROX_PRIMARY_SCH <- ggplot(data=condo_resale.sf, aes(x= `PROX_PRIMARY_SCH`)) +
  geom_histogram(bins=20, color="black", fill="light blue")

PROX_TOP_PRIMARY_SCH <- ggplot(data=condo_resale.sf, 
                               aes(x= `PROX_TOP_PRIMARY_SCH`)) +
  geom_histogram(bins=20, color="black", fill="light blue")

ggarrange(AREA_SQM, AGE, PROX_CBD, PROX_CHILDCARE, PROX_ELDERLYCARE, 
          PROX_URA_GROWTH_AREA, PROX_HAWKER_MARKET, PROX_KINDERGARTEN, PROX_MRT,
          PROX_PARK, PROX_PRIMARY_SCH, PROX_TOP_PRIMARY_SCH,  
          ncol = 3, nrow = 4)

7.3 Drawing the Statistical Point Map

Lastly, let’s reveal the geospatial distribution of condominium resale prices in Singapore. The map will be prepared by using the tmap package.

Let’s use the interactive mode of tmap by using the code chunk below.

tmap_mode("view")

We will then create an interactive point symbol map using the code chunk below.

tmap_options(check.and.fix = TRUE)
tm_shape(mpsz_svy21)+
  tm_polygons() +
tm_shape(condo_resale.sf) +  
  tm_dots(col = "SELLING_PRICE",
          alpha = 0.6,
          style="quantile") +
  tm_view(set.zoom.limits = c(11,14))

It is important to note that tm_dots() is used here instead of tm_bubbles(). Additionally, the set.zoom.limits argument within the tm_view() function is utilized to set the minimum and maximum zoom levels to 11 and 14, respectively.

Before proceeding to the next section, the code snippet provided below will be used to switch R’s display to plot mode.

tmap_mode("plot")

8. HEDONIC PRICING MODELLING IN R

In this section, we are going to build hedonic pricing models for condominium resale units in Singapore using the lm() of R base.

8.1 Simple Linear Regression Method

First, we will build a simple linear regression model by using SELLING_PRICE as the dependent variable and AREA_SQM as the independent variable.

condo.slr <- lm(formula=SELLING_PRICE ~ AREA_SQM, data = condo_resale.sf)

The lm() function returns an object of class “lm”, or for multiple responses, of class c("mlm", "lm"). The functions summary() and anova() can be used to generate and display a summary and an analysis of variance table for the results. Various useful components of the lm output, such as coefficients, effects, fitted values, and residuals, can be extracted using the respective generic accessor functions.

summary(condo.slr)

Call:
lm(formula = SELLING_PRICE ~ AREA_SQM, data = condo_resale.sf)

Residuals:
     Min       1Q   Median       3Q      Max 
-3695815  -391764   -87517   258900 13503875 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) -258121.1    63517.2  -4.064 5.09e-05 ***
AREA_SQM      14719.0      428.1  34.381  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 942700 on 1434 degrees of freedom
Multiple R-squared:  0.4518,    Adjusted R-squared:  0.4515 
F-statistic:  1182 on 1 and 1434 DF,  p-value: < 2.2e-16

The output report reveals that the SELLING_PRICE can be explained by using the formula:

 *y = -258121.1 + 14719x1*

The R-squared value of 0.4518 indicates that the simple regression model explains approximately 45% of the variance in resale prices. Since the p-value is significantly smaller than 0.0001, we reject the null hypothesis that the mean is a good estimator of SELLING_PRICE. This suggests that the simple linear regression model is a good predictor of SELLING_PRICE.

In the “Coefficients” section of the report, it is shown that the p-values for both the intercept and the estimate of ARA_SQM are smaller than 0.001. Consequently, we reject the null hypothesis that B0 and B1 are equal to 0, leading us to conclude that B0 and B1 are reliable parameter estimates.

To visualize the best fit line on a scatterplot, we can use the lm() function as a method within ggplot’s geometry, as demonstrated in the code chunk below.

ggplot(data=condo_resale.sf,          
       aes(x=`AREA_SQM`, y=`SELLING_PRICE`)) +   
   geom_point() +   
   geom_smooth(method = lm)

The figure above reveals that there are a few statistical outliers with relatively high selling prices.

8.2 Multiple Linear Regression Method

VISUALIZING THE RELATIONSHIPS OF THE INDEPENDENT VARIABLES

Before constructing a multiple regression model, it’s important to check that the independent variables are not highly correlated with each other. If highly correlated variables are mistakenly included in the model, it can reduce its accuracy, a problem known as multicollinearity in statistics.

A correlation matrix is commonly used to visualize the relationships between independent variables. In addition to R’s pairs() function, many packages offer ways to display a correlation matrix. In this section, we will use the corrplot package.

The code chunk below demonstrates how to create a scatterplot matrix to explore the relationships between the independent variables in the condo_resale data frame.

corrplot(cor(condo_resale[, 5:23]), diag = FALSE, order = "AOE",          
         tl.pos = "td", tl.cex = 0.5, method = "number", type = "upper")

Matrix reordering is very important for uncovering hidden structures and patterns within a matrix. In corrplot, there are four reordering methods (set using the order parameter): “AOE”, “FPC”, “hclust”, and “alphabet”. In the previous code, the AOE order was applied, which arranges variables based on the angular order of eigenvectors, a method suggested by Michael Friendly.

From the scatterplot matrix, it is evident that Freehold is highly correlated with LEASE_99YEAR. Given this, it is more prudent to include only one of these variables in the subsequent model. Therefore, LEASE_99YEAR is excluded from the following model-building process.

8.3 Building a Hedonic Pricing Model Using the Multiple Linear Regression Method

The code chunk below uses lm() to calibrate the multiple linear regression model.

condo.mlr <- lm(formula = SELLING_PRICE ~ AREA_SQM + AGE    
                + PROX_CBD + PROX_CHILDCARE + PROX_ELDERLYCARE +                 PROX_URA_GROWTH_AREA + PROX_HAWKER_MARKET + PROX_KINDERGARTEN +                    PROX_MRT  + PROX_PARK + PROX_PRIMARY_SCH +  PROX_TOP_PRIMARY_SCH + PROX_SHOPPING_MALL + PROX_SUPERMARKET + PROX_BUS_STOP + NO_Of_UNITS + FAMILY_FRIENDLY + FREEHOLD,data=condo_resale.sf) 
summary(condo.mlr)

Call:
lm(formula = SELLING_PRICE ~ AREA_SQM + AGE + PROX_CBD + PROX_CHILDCARE + 
    PROX_ELDERLYCARE + PROX_URA_GROWTH_AREA + PROX_HAWKER_MARKET + 
    PROX_KINDERGARTEN + PROX_MRT + PROX_PARK + PROX_PRIMARY_SCH + 
    PROX_TOP_PRIMARY_SCH + PROX_SHOPPING_MALL + PROX_SUPERMARKET + 
    PROX_BUS_STOP + NO_Of_UNITS + FAMILY_FRIENDLY + FREEHOLD, 
    data = condo_resale.sf)

Residuals:
     Min       1Q   Median       3Q      Max 
-3475964  -293923   -23069   241043 12260381 

Coefficients:
                       Estimate Std. Error t value Pr(>|t|)    
(Intercept)           481728.40  121441.01   3.967 7.65e-05 ***
AREA_SQM               12708.32     369.59  34.385  < 2e-16 ***
AGE                   -24440.82    2763.16  -8.845  < 2e-16 ***
PROX_CBD              -78669.78    6768.97 -11.622  < 2e-16 ***
PROX_CHILDCARE       -351617.91  109467.25  -3.212  0.00135 ** 
PROX_ELDERLYCARE      171029.42   42110.51   4.061 5.14e-05 ***
PROX_URA_GROWTH_AREA   38474.53   12523.57   3.072  0.00217 ** 
PROX_HAWKER_MARKET     23746.10   29299.76   0.810  0.41782    
PROX_KINDERGARTEN     147468.99   82668.87   1.784  0.07466 .  
PROX_MRT             -314599.68   57947.44  -5.429 6.66e-08 ***
PROX_PARK             563280.50   66551.68   8.464  < 2e-16 ***
PROX_PRIMARY_SCH      180186.08   65237.95   2.762  0.00582 ** 
PROX_TOP_PRIMARY_SCH    2280.04   20410.43   0.112  0.91107    
PROX_SHOPPING_MALL   -206604.06   42840.60  -4.823 1.57e-06 ***
PROX_SUPERMARKET      -44991.80   77082.64  -0.584  0.55953    
PROX_BUS_STOP         683121.35  138353.28   4.938 8.85e-07 ***
NO_Of_UNITS             -231.18      89.03  -2.597  0.00951 ** 
FAMILY_FRIENDLY       140340.77   47020.55   2.985  0.00289 ** 
FREEHOLD              359913.01   49220.22   7.312 4.38e-13 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 755800 on 1417 degrees of freedom
Multiple R-squared:  0.6518,    Adjusted R-squared:  0.6474 
F-statistic: 147.4 on 18 and 1417 DF,  p-value: < 2.2e-16

8.4 Preparing Publication Quality Table: the OLSRR Method

With reference to the report above, it is clear that not all the independent variables are statistically significant. We will revise the model by removing those variables that are not statistically significant.

Now, we are ready to calibrate the revised model by using the code chunk below.

condo.mlr1 <- lm(formula = SELLING_PRICE ~ AREA_SQM + AGE +                     PROX_CBD + PROX_CHILDCARE + PROX_ELDERLYCARE +                    PROX_URA_GROWTH_AREA + PROX_MRT  + PROX_PARK +                     
                   PROX_PRIMARY_SCH + PROX_SHOPPING_MALL    + PROX_BUS_STOP +                     NO_Of_UNITS + FAMILY_FRIENDLY + FREEHOLD, data=condo_resale.sf)

ols_regress(condo.mlr1)
                                Model Summary                                 
-----------------------------------------------------------------------------
R                            0.807       RMSE                     751998.679 
R-Squared                    0.651       MSE                571471422208.591 
Adj. R-Squared               0.647       Coef. Var                    43.168 
Pred R-Squared               0.638       AIC                       42966.758 
MAE                     414819.628       SBC                       43051.072 
-----------------------------------------------------------------------------
 RMSE: Root Mean Square Error 
 MSE: Mean Square Error 
 MAE: Mean Absolute Error 
 AIC: Akaike Information Criteria 
 SBC: Schwarz Bayesian Criteria 

                                     ANOVA                                       
--------------------------------------------------------------------------------
                    Sum of                                                      
                   Squares          DF         Mean Square       F         Sig. 
--------------------------------------------------------------------------------
Regression    1.512586e+15          14        1.080418e+14    189.059    0.0000 
Residual      8.120609e+14        1421    571471422208.591                      
Total         2.324647e+15        1435                                          
--------------------------------------------------------------------------------

                                               Parameter Estimates                                                
-----------------------------------------------------------------------------------------------------------------
               model           Beta    Std. Error    Std. Beta       t        Sig           lower          upper 
-----------------------------------------------------------------------------------------------------------------
         (Intercept)     527633.222    108183.223                   4.877    0.000     315417.244     739849.200 
            AREA_SQM      12777.523       367.479        0.584     34.771    0.000      12056.663      13498.382 
                 AGE     -24687.739      2754.845       -0.167     -8.962    0.000     -30091.739     -19283.740 
            PROX_CBD     -77131.323      5763.125       -0.263    -13.384    0.000     -88436.469     -65826.176 
      PROX_CHILDCARE    -318472.751    107959.512       -0.084     -2.950    0.003    -530249.889    -106695.613 
    PROX_ELDERLYCARE     185575.623     39901.864        0.090      4.651    0.000     107302.737     263848.510 
PROX_URA_GROWTH_AREA      39163.254     11754.829        0.060      3.332    0.001      16104.571      62221.936 
            PROX_MRT    -294745.107     56916.367       -0.112     -5.179    0.000    -406394.234    -183095.980 
           PROX_PARK     570504.807     65507.029        0.150      8.709    0.000     442003.938     699005.677 
    PROX_PRIMARY_SCH     159856.136     60234.599        0.062      2.654    0.008      41697.849     278014.424 
  PROX_SHOPPING_MALL    -220947.251     36561.832       -0.115     -6.043    0.000    -292668.213    -149226.288 
       PROX_BUS_STOP     682482.221    134513.243        0.134      5.074    0.000     418616.359     946348.082 
         NO_Of_UNITS       -245.480        87.947       -0.053     -2.791    0.005       -418.000        -72.961 
     FAMILY_FRIENDLY     146307.576     46893.021        0.057      3.120    0.002      54320.593     238294.560 
            FREEHOLD     350599.812     48506.485        0.136      7.228    0.000     255447.802     445751.821 
-----------------------------------------------------------------------------------------------------------------

8.5 Preparing Publication Quality Table: the gtsummary Method

The gtsummary package provides an elegant and flexible way to create publication-ready summary tables in R.

In the code chunk below, tbl_regression() is used to create a well formatted regression report.

tbl_regression(condo.mlr1, intercept = TRUE)
Characteristic Beta 95% CI1 p-value
(Intercept) 527,633 315,417, 739,849 <0.001
AREA_SQM 12,778 12,057, 13,498 <0.001
AGE -24,688 -30,092, -19,284 <0.001
PROX_CBD -77,131 -88,436, -65,826 <0.001
PROX_CHILDCARE -318,473 -530,250, -106,696 0.003
PROX_ELDERLYCARE 185,576 107,303, 263,849 <0.001
PROX_URA_GROWTH_AREA 39,163 16,105, 62,222 <0.001
PROX_MRT -294,745 -406,394, -183,096 <0.001
PROX_PARK 570,505 442,004, 699,006 <0.001
PROX_PRIMARY_SCH 159,856 41,698, 278,014 0.008
PROX_SHOPPING_MALL -220,947 -292,668, -149,226 <0.001
PROX_BUS_STOP 682,482 418,616, 946,348 <0.001
NO_Of_UNITS -245 -418, -73 0.005
FAMILY_FRIENDLY 146,308 54,321, 238,295 0.002
FREEHOLD 350,600 255,448, 445,752 <0.001
1 CI = Confidence Interval

With the gtsummary package, model statistics can be included in the report by either appending them to the report table by using add_glance_table() or adding as a table source note by using add_glance_source_note() as shown in the code chunk below.

tbl_regression(condo.mlr1,                 
               intercept = TRUE) %>%  add_glance_source_note(     
                 label = list(sigma ~ "\U03C3"),     
                 include = c(r.squared, adj.r.squared,                  
                             AIC, statistic,                 
                             p.value, sigma))
Characteristic Beta 95% CI1 p-value
(Intercept) 527,633 315,417, 739,849 <0.001
AREA_SQM 12,778 12,057, 13,498 <0.001
AGE -24,688 -30,092, -19,284 <0.001
PROX_CBD -77,131 -88,436, -65,826 <0.001
PROX_CHILDCARE -318,473 -530,250, -106,696 0.003
PROX_ELDERLYCARE 185,576 107,303, 263,849 <0.001
PROX_URA_GROWTH_AREA 39,163 16,105, 62,222 <0.001
PROX_MRT -294,745 -406,394, -183,096 <0.001
PROX_PARK 570,505 442,004, 699,006 <0.001
PROX_PRIMARY_SCH 159,856 41,698, 278,014 0.008
PROX_SHOPPING_MALL -220,947 -292,668, -149,226 <0.001
PROX_BUS_STOP 682,482 418,616, 946,348 <0.001
NO_Of_UNITS -245 -418, -73 0.005
FAMILY_FRIENDLY 146,308 54,321, 238,295 0.002
FREEHOLD 350,600 255,448, 445,752 <0.001
R² = 0.651; Adjusted R² = 0.647; AIC = 42,967; Statistic = 189; p-value = <0.001; σ = 755,957
1 CI = Confidence Interval

For additional customization options, you can refer to the “Tutorial: tbl_regression” documentation, which provides detailed guidance on how to further modify and tailor the output of regression tables to suit your needs.

CHECKING FOR MULTICOLLINEARITY

In this section, we introduce a fantastic R package specifically designed for performing ordinary least squares (OLS) regression: olsrr. This package offers a collection of highly useful methods for improving multiple linear regression models, including:

  • Comprehensive regression output
  • Residual diagnostics
  • Measures of influence
  • Heteroskedasticity tests
  • Collinearity diagnostics
  • Model fit assessment
  • Variable contribution assessment
  • Variable selection procedures

In the code chunk below, the ols_vif_tol() function from the olsrr package is used to check for signs of multicollinearity in the model.

ols_vif_tol(condo.mlr1)
              Variables Tolerance      VIF
1              AREA_SQM 0.8728554 1.145665
2                   AGE 0.7071275 1.414172
3              PROX_CBD 0.6356147 1.573280
4        PROX_CHILDCARE 0.3066019 3.261559
5      PROX_ELDERLYCARE 0.6598479 1.515501
6  PROX_URA_GROWTH_AREA 0.7510311 1.331503
7              PROX_MRT 0.5236090 1.909822
8             PROX_PARK 0.8279261 1.207837
9      PROX_PRIMARY_SCH 0.4524628 2.210126
10   PROX_SHOPPING_MALL 0.6738795 1.483945
11        PROX_BUS_STOP 0.3514118 2.845664
12          NO_Of_UNITS 0.6901036 1.449058
13      FAMILY_FRIENDLY 0.7244157 1.380423
14             FREEHOLD 0.6931163 1.442759

Since the VIF values of the independent variables are less than 5, we can safely conclude that there is no sign of multicollinearity among the independent variables.

TEST FOR NON-LINEARITY

In multiple linear regression, it is important for us to test the assumption that linearity and additive of the relationship between dependent and independent variables.

Using the ols_plot_resid_fit() of the olsrr package, let’s perform the linearity assumption test.

ols_plot_resid_fit(condo.mlr1)

The figure above shows that most of the data points are scattered around the 0 line, indicating that the relationships between the dependent variable and the independent variables are likely linear. Therefore, we can reasonably conclude that the assumption of linearity holds for this model.

TEST FOR NORMALITY ASSUMPTION

Lastly, the code chunk below uses ols_plot_resid_hist() of the olsrr package to perform the normality assumption test.

ols_plot_resid_hist(condo.mlr1)

The figure reveals that the residual of the multiple linear regression model (i.e. condo.mlr1) resembles a normal distribution.

For more formal statistical test methods, refer to the ols_test_normality() of the olsrr package as shown in the code chunk below.

ols_test_normality(condo.mlr1)
-----------------------------------------------
       Test             Statistic       pvalue  
-----------------------------------------------
Shapiro-Wilk              0.6856         0.0000 
Kolmogorov-Smirnov        0.1366         0.0000 
Cramer-von Mises         121.0768        0.0000 
Anderson-Darling         67.9551         0.0000 
-----------------------------------------------

The summary table above reveals that the p-values of the four tests are way smaller than the alpha value of 0.05. Hence, we will reject the null hypothesis and infer that there is statistical evidence that the residual are not normally distributed.

TESTING FOR SPATIAL AUTOCORRELATION

The hedonic model we are building incorporates geographically referenced attributes, making it important to visualize the residuals of the hedonic pricing model.

To conduct a spatial autocorrelation test, we need to convert the condo_resale.sf object from an sf data frame to a SpatialPointsDataFrame.

The first step is to extract the residuals from the hedonic pricing model and save them as a separate data frame.

mlr.output <- as.data.frame(condo.mlr1$residuals)

Next, let’s join the newly created data frame with condo_resale.sf object.

condo_resale.res.sf <- cbind(condo_resale.sf, 
                        condo.mlr1$residuals) %>%
rename(`MLR_RES` = `condo.mlr1.residuals`)

Next, we will convert condo_resale.res.sf from a simple feature object into a SpatialPointsDataFrame, as the spdep package can only work with spatial data objects in the sp format.

The following code chunk demonstrates the data conversion process.

condo_resale.sp <- as_Spatial(condo_resale.res.sf)
condo_resale.sp
class       : SpatialPointsDataFrame 
features    : 1436 
extent      : 14940.85, 43352.45, 24765.67, 48382.81  (xmin, xmax, ymin, ymax)
crs         : +proj=tmerc +lat_0=1.36666666666667 +lon_0=103.833333333333 +k=1 +x_0=28001.642 +y_0=38744.572 +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs 
variables   : 23
names       : POSTCODE, SELLING_PRICE, AREA_SQM, AGE,    PROX_CBD, PROX_CHILDCARE, PROX_ELDERLYCARE, PROX_URA_GROWTH_AREA, PROX_HAWKER_MARKET, PROX_KINDERGARTEN,    PROX_MRT,   PROX_PARK, PROX_PRIMARY_SCH, PROX_TOP_PRIMARY_SCH, PROX_SHOPPING_MALL, ... 
min values  :    18965,        540000,       34,   0, 0.386916393,    0.004927023,      0.054508623,          0.214539508,        0.051817113,       0.004927023, 0.052779424, 0.029064164,      0.077106132,          0.077106132,                  0, ... 
max values  :   828833,       1.8e+07,      619,  37, 19.18042832,     3.46572633,      3.949157205,           9.15540001,        5.374348075,       2.229045366,  3.48037319,  2.16104919,      3.928989144,          6.748192062,        3.477433767, ... 

Next, we will use the tmap package to display the distribution of the residuals on an interactive map.

The code chunk below is used to create an interactive point symbol map.

tmap_mode("view") 
tm_shape(mpsz_svy21)+   
  tmap_options(check.and.fix = TRUE) +   
  tm_polygons(alpha = 0.4) + 
  tm_shape(condo_resale.res.sf) +     
  tm_dots(col = "MLR_RES",           alpha = 0.6,           style="quantile") +   
  tm_view(set.zoom.limits = c(11,14))

Remember to switch back to “plot” mode before continuing!

tmap_mode("plot")

The figure above indicates signs of spatial autocorrelation. To confirm this observation, we will perform the Moran’s I test.

First, we will compute the distance-based weight matrix using the dnearneigh() function from the spdep package.

 nb <- dnearneigh(coordinates(condo_resale.sp), 0, 1500, longlat = FALSE) 
 summary(nb)
Neighbour list object:
Number of regions: 1436 
Number of nonzero links: 66266 
Percentage nonzero weights: 3.213526 
Average number of links: 46.14624 
10 disjoint connected subgraphs
Link number distribution:

  1   3   5   7   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24 
  3   3   9   4   3  15  10  19  17  45  19   5  14  29  19   6  35  45  18  47 
 25  26  27  28  29  30  31  32  33  34  35  36  37  38  39  40  41  42  43  44 
 16  43  22  26  21  11   9  23  22  13  16  25  21  37  16  18   8  21   4  12 
 45  46  47  48  49  50  51  52  53  54  55  56  57  58  59  60  61  62  63  64 
  8  36  18  14  14  43  11  12   8  13  12  13   4   5   6  12  11  20  29  33 
 65  66  67  68  69  70  71  72  73  74  75  76  77  78  79  80  81  82  83  84 
 15  20  10  14  15  15  11  16  12  10   8  19  12  14   9   8   4  13  11   6 
 85  86  87  88  89  90  91  92  93  94  95  96  97  98  99 100 101 102 103 104 
  4   9   4   4   4   6   2  16   9   4   5   9   3   9   4   2   1   2   1   1 
105 106 107 108 109 110 112 116 125 
  1   5   9   2   1   3   1   1   1 
3 least connected regions:
193 194 277 with 1 link
1 most connected region:
285 with 125 links

Next, let’s use the nb2listw() of the spdep package to convert the output neighbors lists (i.e. nb) into spatial weights.

nb_lw <- nb2listw(nb, style = 'W') 
summary(nb_lw)
Characteristics of weights list object:
Neighbour list object:
Number of regions: 1436 
Number of nonzero links: 66266 
Percentage nonzero weights: 3.213526 
Average number of links: 46.14624 
10 disjoint connected subgraphs
Link number distribution:

  1   3   5   7   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24 
  3   3   9   4   3  15  10  19  17  45  19   5  14  29  19   6  35  45  18  47 
 25  26  27  28  29  30  31  32  33  34  35  36  37  38  39  40  41  42  43  44 
 16  43  22  26  21  11   9  23  22  13  16  25  21  37  16  18   8  21   4  12 
 45  46  47  48  49  50  51  52  53  54  55  56  57  58  59  60  61  62  63  64 
  8  36  18  14  14  43  11  12   8  13  12  13   4   5   6  12  11  20  29  33 
 65  66  67  68  69  70  71  72  73  74  75  76  77  78  79  80  81  82  83  84 
 15  20  10  14  15  15  11  16  12  10   8  19  12  14   9   8   4  13  11   6 
 85  86  87  88  89  90  91  92  93  94  95  96  97  98  99 100 101 102 103 104 
  4   9   4   4   4   6   2  16   9   4   5   9   3   9   4   2   1   2   1   1 
105 106 107 108 109 110 112 116 125 
  1   5   9   2   1   3   1   1   1 
3 least connected regions:
193 194 277 with 1 link
1 most connected region:
285 with 125 links

Weights style: W 
Weights constants summary:
     n      nn   S0       S1       S2
W 1436 2062096 1436 94.81916 5798.341

After this, the lm.morantest() function from the spdep package will be used to perform Moran’s I test to assess residual spatial autocorrelation. This will help us determine whether the spatial distribution of residuals indicates significant spatial autocorrelation.

lm.morantest(condo.mlr1, nb_lw)

    Global Moran I for regression residuals

data:  
model: lm(formula = SELLING_PRICE ~ AREA_SQM + AGE + PROX_CBD +
PROX_CHILDCARE + PROX_ELDERLYCARE + PROX_URA_GROWTH_AREA + PROX_MRT +
PROX_PARK + PROX_PRIMARY_SCH + PROX_SHOPPING_MALL + PROX_BUS_STOP +
NO_Of_UNITS + FAMILY_FRIENDLY + FREEHOLD, data = condo_resale.sf)
weights: nb_lw

Moran I statistic standard deviate = 24.366, p-value < 2.2e-16
alternative hypothesis: greater
sample estimates:
Observed Moran I      Expectation         Variance 
    1.438876e-01    -5.487594e-03     3.758259e-05 

The Global Moran’s I test for residual spatial autocorrelation indicates a p-value of less than 0.00000000000000022, which is significantly smaller than the alpha value of 0.05. Therefore, we reject the null hypothesis that the residuals are randomly distributed.

Additionally, since the observed Global Moran’s I value is 0.1424418, which is greater than 0, we can infer that the residuals exhibit a clustered distribution pattern.

9. BUILDING HEDONIC PRICING MODELS USING GWmodel

In this section, we are going to model hedonic pricing using both fixed and adaptive bandwidth schemes.

9.1 Building Fixed Bandwidth GWR Model

COMPUTING FIXED BANDWIDTH

In the code chunk below, the bw.gwr() function from the GWModel package is used to determine the optimal fixed bandwidth for the model. Note that the argument adaptive is set to FALSE, indicating that we are calculating a fixed bandwidth.

There are two possible approaches to determine the stopping rule: the cross-validation (CV) approach and the AIC corrected (AICc) approach. In this case, we define the stopping rule based on the agreement between these approaches.

bw.fixed <- bw.gwr(formula = SELLING_PRICE ~ AREA_SQM + AGE + PROX_CBD +                       PROX_CHILDCARE + PROX_ELDERLYCARE  + PROX_URA_GROWTH_AREA +                       PROX_MRT   + PROX_PARK + PROX_PRIMARY_SCH +                      
                     PROX_SHOPPING_MALL + PROX_BUS_STOP + NO_Of_UNITS +                       FAMILY_FRIENDLY + FREEHOLD, data=condo_resale.sp, approach="CV",                     kernel="gaussian", adaptive=FALSE, longlat=FALSE)
Fixed bandwidth: 17660.96 CV score: 8.259118e+14 
Fixed bandwidth: 10917.26 CV score: 7.970454e+14 
Fixed bandwidth: 6749.419 CV score: 7.273273e+14 
Fixed bandwidth: 4173.553 CV score: 6.300006e+14 
Fixed bandwidth: 2581.58 CV score: 5.404958e+14 
Fixed bandwidth: 1597.687 CV score: 4.857515e+14 
Fixed bandwidth: 989.6077 CV score: 4.722431e+14 
Fixed bandwidth: 613.7939 CV score: 1.378294e+16 
Fixed bandwidth: 1221.873 CV score: 4.778717e+14 
Fixed bandwidth: 846.0596 CV score: 4.791629e+14 
Fixed bandwidth: 1078.325 CV score: 4.751406e+14 
Fixed bandwidth: 934.7772 CV score: 4.72518e+14 
Fixed bandwidth: 1023.495 CV score: 4.730305e+14 
Fixed bandwidth: 968.6643 CV score: 4.721317e+14 
Fixed bandwidth: 955.7206 CV score: 4.722072e+14 
Fixed bandwidth: 976.6639 CV score: 4.721387e+14 
Fixed bandwidth: 963.7202 CV score: 4.721484e+14 
Fixed bandwidth: 971.7199 CV score: 4.721293e+14 
Fixed bandwidth: 973.6083 CV score: 4.721309e+14 
Fixed bandwidth: 970.5527 CV score: 4.721295e+14 
Fixed bandwidth: 972.4412 CV score: 4.721296e+14 
Fixed bandwidth: 971.2741 CV score: 4.721292e+14 
Fixed bandwidth: 970.9985 CV score: 4.721293e+14 
Fixed bandwidth: 971.4443 CV score: 4.721292e+14 
Fixed bandwidth: 971.5496 CV score: 4.721293e+14 
Fixed bandwidth: 971.3793 CV score: 4.721292e+14 
Fixed bandwidth: 971.3391 CV score: 4.721292e+14 
Fixed bandwidth: 971.3143 CV score: 4.721292e+14 
Fixed bandwidth: 971.3545 CV score: 4.721292e+14 
Fixed bandwidth: 971.3296 CV score: 4.721292e+14 
Fixed bandwidth: 971.345 CV score: 4.721292e+14 
Fixed bandwidth: 971.3355 CV score: 4.721292e+14 
Fixed bandwidth: 971.3413 CV score: 4.721292e+14 
Fixed bandwidth: 971.3377 CV score: 4.721292e+14 
Fixed bandwidth: 971.34 CV score: 4.721292e+14 
Fixed bandwidth: 971.3405 CV score: 4.721292e+14 
Fixed bandwidth: 971.3408 CV score: 4.721292e+14 
Fixed bandwidth: 971.3403 CV score: 4.721292e+14 
Fixed bandwidth: 971.3406 CV score: 4.721292e+14 
Fixed bandwidth: 971.3404 CV score: 4.721292e+14 
Fixed bandwidth: 971.3405 CV score: 4.721292e+14 
Fixed bandwidth: 971.3405 CV score: 4.721292e+14 

The result shows that the recommended bandwidth is 971.3405 meters. The result is in meters because the coordinate reference system (CRS) used in the model is likely a projected coordinate system, such as SVY21 (EPSG: 3414) or UTM, which measures distances in meters. Unlike geographic coordinate systems like WGS84, which use degrees to represent latitude and longitude, projected systems are designed for spatial analysis and use linear units like meters to provide accurate distance measurements over a specific area.

GWModel METHOD - FIXED BANDWIDTH

Now we can use the code chunk below to calibrate the gwr model using the fixed bandwidth and gaussian kernel.

gwr.fixed <- gwr.basic(formula = SELLING_PRICE ~ AREA_SQM + AGE + PROX_CBD +                           PROX_CHILDCARE + PROX_ELDERLYCARE  + PROX_URA_GROWTH_AREA +                           PROX_MRT   + PROX_PARK + PROX_PRIMARY_SCH +                           PROX_SHOPPING_MALL + PROX_BUS_STOP + NO_Of_UNITS +                           FAMILY_FRIENDLY + FREEHOLD,                         
                       data=condo_resale.sp,                       
                       bw=bw.fixed,                     
                       kernel = 'gaussian',                    
                       longlat = FALSE)

The output is saved as a list of class “gwrm”. The code below can be used to display the model output.

gwr.fixed
   ***********************************************************************
   *                       Package   GWmodel                             *
   ***********************************************************************
   Program starts at: 2024-10-15 20:04:54.874292 
   Call:
   gwr.basic(formula = SELLING_PRICE ~ AREA_SQM + AGE + PROX_CBD + 
    PROX_CHILDCARE + PROX_ELDERLYCARE + PROX_URA_GROWTH_AREA + 
    PROX_MRT + PROX_PARK + PROX_PRIMARY_SCH + PROX_SHOPPING_MALL + 
    PROX_BUS_STOP + NO_Of_UNITS + FAMILY_FRIENDLY + FREEHOLD, 
    data = condo_resale.sp, bw = bw.fixed, kernel = "gaussian", 
    longlat = FALSE)

   Dependent (y) variable:  SELLING_PRICE
   Independent variables:  AREA_SQM AGE PROX_CBD PROX_CHILDCARE PROX_ELDERLYCARE PROX_URA_GROWTH_AREA PROX_MRT PROX_PARK PROX_PRIMARY_SCH PROX_SHOPPING_MALL PROX_BUS_STOP NO_Of_UNITS FAMILY_FRIENDLY FREEHOLD
   Number of data points: 1436
   ***********************************************************************
   *                    Results of Global Regression                     *
   ***********************************************************************

   Call:
    lm(formula = formula, data = data)

   Residuals:
     Min       1Q   Median       3Q      Max 
-3470778  -298119   -23481   248917 12234210 

   Coefficients:
                          Estimate Std. Error t value Pr(>|t|)    
   (Intercept)           527633.22  108183.22   4.877 1.20e-06 ***
   AREA_SQM               12777.52     367.48  34.771  < 2e-16 ***
   AGE                   -24687.74    2754.84  -8.962  < 2e-16 ***
   PROX_CBD              -77131.32    5763.12 -13.384  < 2e-16 ***
   PROX_CHILDCARE       -318472.75  107959.51  -2.950 0.003231 ** 
   PROX_ELDERLYCARE      185575.62   39901.86   4.651 3.61e-06 ***
   PROX_URA_GROWTH_AREA   39163.25   11754.83   3.332 0.000885 ***
   PROX_MRT             -294745.11   56916.37  -5.179 2.56e-07 ***
   PROX_PARK             570504.81   65507.03   8.709  < 2e-16 ***
   PROX_PRIMARY_SCH      159856.14   60234.60   2.654 0.008046 ** 
   PROX_SHOPPING_MALL   -220947.25   36561.83  -6.043 1.93e-09 ***
   PROX_BUS_STOP         682482.22  134513.24   5.074 4.42e-07 ***
   NO_Of_UNITS             -245.48      87.95  -2.791 0.005321 ** 
   FAMILY_FRIENDLY       146307.58   46893.02   3.120 0.001845 ** 
   FREEHOLD              350599.81   48506.48   7.228 7.98e-13 ***

   ---Significance stars
   Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
   Residual standard error: 756000 on 1421 degrees of freedom
   Multiple R-squared: 0.6507
   Adjusted R-squared: 0.6472 
   F-statistic: 189.1 on 14 and 1421 DF,  p-value: < 2.2e-16 
   ***Extra Diagnostic information
   Residual sum of squares: 8.120609e+14
   Sigma(hat): 752522.9
   AIC:  42966.76
   AICc:  42967.14
   BIC:  41731.39
   ***********************************************************************
   *          Results of Geographically Weighted Regression              *
   ***********************************************************************

   *********************Model calibration information*********************
   Kernel function: gaussian 
   Fixed bandwidth: 971.3405 
   Regression points: the same locations as observations are used.
   Distance metric: Euclidean distance metric is used.

   ****************Summary of GWR coefficient estimates:******************
                               Min.     1st Qu.      Median     3rd Qu.
   Intercept            -3.5988e+07 -5.1998e+05  7.6780e+05  1.7412e+06
   AREA_SQM              1.0003e+03  5.2758e+03  7.4740e+03  1.2301e+04
   AGE                  -1.3475e+05 -2.0813e+04 -8.6260e+03 -3.7784e+03
   PROX_CBD             -7.7047e+07 -2.3608e+05 -8.3600e+04  3.4646e+04
   PROX_CHILDCARE       -6.0097e+06 -3.3667e+05 -9.7425e+04  2.9007e+05
   PROX_ELDERLYCARE     -3.5000e+06 -1.5970e+05  3.1971e+04  1.9577e+05
   PROX_URA_GROWTH_AREA -3.0170e+06 -8.2013e+04  7.0749e+04  2.2612e+05
   PROX_MRT             -3.5282e+06 -6.5836e+05 -1.8833e+05  3.6922e+04
   PROX_PARK            -1.2062e+06 -2.1732e+05  3.5383e+04  4.1335e+05
   PROX_PRIMARY_SCH     -2.2695e+07 -1.7066e+05  4.8472e+04  5.1555e+05
   PROX_SHOPPING_MALL   -7.2585e+06 -1.6684e+05 -1.0517e+04  1.5923e+05
   PROX_BUS_STOP        -1.4676e+06 -4.5207e+04  3.7601e+05  1.1664e+06
   NO_Of_UNITS          -1.3170e+03 -2.4822e+02 -3.0846e+01  2.5496e+02
   FAMILY_FRIENDLY      -2.2749e+06 -1.1140e+05  7.6214e+03  1.6107e+05
   FREEHOLD             -9.2067e+06  3.8073e+04  1.5169e+05  3.7528e+05
                             Max.
   Intercept            112793548
   AREA_SQM                 21575
   AGE                     434201
   PROX_CBD               2704596
   PROX_CHILDCARE         1654087
   PROX_ELDERLYCARE      38867814
   PROX_URA_GROWTH_AREA  78515730
   PROX_MRT               3124316
   PROX_PARK             18122425
   PROX_PRIMARY_SCH       4637503
   PROX_SHOPPING_MALL     1529952
   PROX_BUS_STOP         11342182
   NO_Of_UNITS              12907
   FAMILY_FRIENDLY        1720744
   FREEHOLD               6073636
   ************************Diagnostic information*************************
   Number of data points: 1436 
   Effective number of parameters (2trace(S) - trace(S'S)): 438.3804 
   Effective degrees of freedom (n-2trace(S) + trace(S'S)): 997.6196 
   AICc (GWR book, Fotheringham, et al. 2002, p. 61, eq 2.33): 42263.61 
   AIC (GWR book, Fotheringham, et al. 2002,GWR p. 96, eq. 4.22): 41632.36 
   BIC (GWR book, Fotheringham, et al. 2002,GWR p. 61, eq. 2.34): 42515.71 
   Residual sum of squares: 2.53407e+14 
   R-square value:  0.8909912 
   Adjusted R-square value:  0.8430417 

   ***********************************************************************
   Program stops at: 2024-10-15 20:04:56.310744 

The report reveals that the AICc (Akaike Information Criterion corrected) of the geographically weighted regression (GWR) model is 42263.61, which is significantly smaller than the AICc of the global multiple linear regression model, which is 42967.1. This suggests that the GWR model provides a better fit for the data compared to the global model.

9.2 Building Adaptive Bandwidth GWR Model

In this section, we will calibrate the gwr-based hedonic pricing model by using the adaptive bandwidth approach.

COMPUTING THE ADAPTIVE BANDWIDTH

As in the earlier section, we will first use bw.gwr() to determine the optimal number of data points to use for the adaptive bandwidth. The code used for this process is very similar to the one used for calculating the fixed bandwidth, except that the adaptive argument is now set to TRUE. This change ensures that the bandwidth adapts to the density of data points, allowing for more flexibility in areas with varying data density.

bw.adaptive <- bw.gwr(formula = SELLING_PRICE ~ AREA_SQM + AGE  +                          PROX_CBD + PROX_CHILDCARE + PROX_ELDERLYCARE    +                PROX_URA_GROWTH_AREA + PROX_MRT + PROX_PARK +                          PROX_PRIMARY_SCH + PROX_SHOPPING_MALL   + PROX_BUS_STOP +                          NO_Of_UNITS + FAMILY_FRIENDLY + FREEHOLD,                       
                      data=condo_resale.sp,                      
                      approach="CV",                        
                      kernel="gaussian",                     
                      adaptive=TRUE,                       
                      longlat=FALSE)
Adaptive bandwidth: 895 CV score: 7.952401e+14 
Adaptive bandwidth: 561 CV score: 7.667364e+14 
Adaptive bandwidth: 354 CV score: 6.953454e+14 
Adaptive bandwidth: 226 CV score: 6.15223e+14 
Adaptive bandwidth: 147 CV score: 5.674373e+14 
Adaptive bandwidth: 98 CV score: 5.426745e+14 
Adaptive bandwidth: 68 CV score: 5.168117e+14 
Adaptive bandwidth: 49 CV score: 4.859631e+14 
Adaptive bandwidth: 37 CV score: 4.646518e+14 
Adaptive bandwidth: 30 CV score: 4.422088e+14 
Adaptive bandwidth: 25 CV score: 4.430816e+14 
Adaptive bandwidth: 32 CV score: 4.505602e+14 
Adaptive bandwidth: 27 CV score: 4.462172e+14 
Adaptive bandwidth: 30 CV score: 4.422088e+14 

The result shows that the 30 is the recommended data points to be used.

CONSTRUCTING THE ADAPTIVE BANDWIDTH GWRModel

Now, we can go ahead to calibrate the gwr-based hedonic pricing model by using adaptive bandwidth and gaussian kernel.

gwr.adaptive <- gwr.basic(formula = SELLING_PRICE ~ AREA_SQM + AGE +                              PROX_CBD + PROX_CHILDCARE + PROX_ELDERLYCARE +                              PROX_URA_GROWTH_AREA + PROX_MRT + PROX_PARK +                              PROX_PRIMARY_SCH + PROX_SHOPPING_MALL + PROX_BUS_STOP +                              NO_Of_UNITS + FAMILY_FRIENDLY + FREEHOLD,                            
                          data=condo_resale.sp, 
                          bw=bw.adaptive,                            
                          kernel = 'gaussian',                
                          adaptive=TRUE,                        
                          longlat = FALSE)

The code below displays the model output.

gwr.adaptive
   ***********************************************************************
   *                       Package   GWmodel                             *
   ***********************************************************************
   Program starts at: 2024-10-15 20:05:07.200518 
   Call:
   gwr.basic(formula = SELLING_PRICE ~ AREA_SQM + AGE + PROX_CBD + 
    PROX_CHILDCARE + PROX_ELDERLYCARE + PROX_URA_GROWTH_AREA + 
    PROX_MRT + PROX_PARK + PROX_PRIMARY_SCH + PROX_SHOPPING_MALL + 
    PROX_BUS_STOP + NO_Of_UNITS + FAMILY_FRIENDLY + FREEHOLD, 
    data = condo_resale.sp, bw = bw.adaptive, kernel = "gaussian", 
    adaptive = TRUE, longlat = FALSE)

   Dependent (y) variable:  SELLING_PRICE
   Independent variables:  AREA_SQM AGE PROX_CBD PROX_CHILDCARE PROX_ELDERLYCARE PROX_URA_GROWTH_AREA PROX_MRT PROX_PARK PROX_PRIMARY_SCH PROX_SHOPPING_MALL PROX_BUS_STOP NO_Of_UNITS FAMILY_FRIENDLY FREEHOLD
   Number of data points: 1436
   ***********************************************************************
   *                    Results of Global Regression                     *
   ***********************************************************************

   Call:
    lm(formula = formula, data = data)

   Residuals:
     Min       1Q   Median       3Q      Max 
-3470778  -298119   -23481   248917 12234210 

   Coefficients:
                          Estimate Std. Error t value Pr(>|t|)    
   (Intercept)           527633.22  108183.22   4.877 1.20e-06 ***
   AREA_SQM               12777.52     367.48  34.771  < 2e-16 ***
   AGE                   -24687.74    2754.84  -8.962  < 2e-16 ***
   PROX_CBD              -77131.32    5763.12 -13.384  < 2e-16 ***
   PROX_CHILDCARE       -318472.75  107959.51  -2.950 0.003231 ** 
   PROX_ELDERLYCARE      185575.62   39901.86   4.651 3.61e-06 ***
   PROX_URA_GROWTH_AREA   39163.25   11754.83   3.332 0.000885 ***
   PROX_MRT             -294745.11   56916.37  -5.179 2.56e-07 ***
   PROX_PARK             570504.81   65507.03   8.709  < 2e-16 ***
   PROX_PRIMARY_SCH      159856.14   60234.60   2.654 0.008046 ** 
   PROX_SHOPPING_MALL   -220947.25   36561.83  -6.043 1.93e-09 ***
   PROX_BUS_STOP         682482.22  134513.24   5.074 4.42e-07 ***
   NO_Of_UNITS             -245.48      87.95  -2.791 0.005321 ** 
   FAMILY_FRIENDLY       146307.58   46893.02   3.120 0.001845 ** 
   FREEHOLD              350599.81   48506.48   7.228 7.98e-13 ***

   ---Significance stars
   Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
   Residual standard error: 756000 on 1421 degrees of freedom
   Multiple R-squared: 0.6507
   Adjusted R-squared: 0.6472 
   F-statistic: 189.1 on 14 and 1421 DF,  p-value: < 2.2e-16 
   ***Extra Diagnostic information
   Residual sum of squares: 8.120609e+14
   Sigma(hat): 752522.9
   AIC:  42966.76
   AICc:  42967.14
   BIC:  41731.39
   ***********************************************************************
   *          Results of Geographically Weighted Regression              *
   ***********************************************************************

   *********************Model calibration information*********************
   Kernel function: gaussian 
   Adaptive bandwidth: 30 (number of nearest neighbours)
   Regression points: the same locations as observations are used.
   Distance metric: Euclidean distance metric is used.

   ****************Summary of GWR coefficient estimates:******************
                               Min.     1st Qu.      Median     3rd Qu.
   Intercept            -1.3487e+08 -2.4669e+05  7.7928e+05  1.6194e+06
   AREA_SQM              3.3188e+03  5.6285e+03  7.7825e+03  1.2738e+04
   AGE                  -9.6746e+04 -2.9288e+04 -1.4043e+04 -5.6119e+03
   PROX_CBD             -2.5330e+06 -1.6256e+05 -7.7242e+04  2.6624e+03
   PROX_CHILDCARE       -1.2790e+06 -2.0175e+05  8.7158e+03  3.7778e+05
   PROX_ELDERLYCARE     -1.6212e+06 -9.2050e+04  6.1029e+04  2.8184e+05
   PROX_URA_GROWTH_AREA -7.2686e+06 -3.0350e+04  4.5869e+04  2.4613e+05
   PROX_MRT             -4.3781e+07 -6.7282e+05 -2.2115e+05 -7.4593e+04
   PROX_PARK            -2.9020e+06 -1.6782e+05  1.1601e+05  4.6572e+05
   PROX_PRIMARY_SCH     -8.6418e+05 -1.6627e+05 -7.7853e+03  4.3222e+05
   PROX_SHOPPING_MALL   -1.8272e+06 -1.3175e+05 -1.4049e+04  1.3799e+05
   PROX_BUS_STOP        -2.0579e+06 -7.1461e+04  4.1104e+05  1.2071e+06
   NO_Of_UNITS          -2.1993e+03 -2.3685e+02 -3.4699e+01  1.1657e+02
   FAMILY_FRIENDLY      -5.9879e+05 -5.0927e+04  2.6173e+04  2.2481e+05
   FREEHOLD             -1.6340e+05  4.0765e+04  1.9023e+05  3.7960e+05
                            Max.
   Intercept            18758355
   AREA_SQM                23064
   AGE                     13303
   PROX_CBD             11346650
   PROX_CHILDCARE        2892127
   PROX_ELDERLYCARE      2465671
   PROX_URA_GROWTH_AREA  7384059
   PROX_MRT              1186242
   PROX_PARK             2588497
   PROX_PRIMARY_SCH      3381462
   PROX_SHOPPING_MALL   38038564
   PROX_BUS_STOP        12081592
   NO_Of_UNITS              1010
   FAMILY_FRIENDLY       2072414
   FREEHOLD              1813995
   ************************Diagnostic information*************************
   Number of data points: 1436 
   Effective number of parameters (2trace(S) - trace(S'S)): 350.3088 
   Effective degrees of freedom (n-2trace(S) + trace(S'S)): 1085.691 
   AICc (GWR book, Fotheringham, et al. 2002, p. 61, eq 2.33): 41982.22 
   AIC (GWR book, Fotheringham, et al. 2002,GWR p. 96, eq. 4.22): 41546.74 
   BIC (GWR book, Fotheringham, et al. 2002,GWR p. 61, eq. 2.34): 41914.08 
   Residual sum of squares: 2.528227e+14 
   R-square value:  0.8912425 
   Adjusted R-square value:  0.8561185 

   ***********************************************************************
   Program stops at: 2024-10-15 20:05:08.832204 

The report shows that the AICc adaptive distance-gwr is 41982.22, which is smaller than the AICc fixed distance-gwr of 42263.61.

9.3 Visualizing GWR Output

In addition to the regression residuals, the output feature class table includes fields for observed and predicted y values, condition number (cond), Local R², residuals, explanatory variable coefficients, and their standard errors:

  • Condition Number: This diagnostic evaluates local collinearity. Strong local collinearity can make results unstable. Results with condition numbers greater than 30 may be unreliable.

  • Local R²: These values range from 0.0 to 1.0 and show how well the local regression model fits the observed y values. Low values indicate poor model performance in that area. Mapping the Local R² values can highlight where the GWR model predicts well and where it struggles, potentially identifying missing variables in the model.

  • Predicted: These are the fitted y values estimated by GWR.

  • Residuals: Residuals are calculated by subtracting the fitted y values from the observed y values. Standardized residuals have a mean of zero and a standard deviation of 1. A cold-to-hot color map of standardized residuals can be created to visualize these values.

  • Coefficient Standard Error: This measures the reliability of each coefficient estimate. Smaller standard errors relative to the coefficient values indicate higher confidence in the estimates. Large standard errors may signal local collinearity issues.

All these values are stored in a SpatialPointsDataFrame or SpatialPolygonsDataFrame object, which is integrated with fit points, GWR coefficient estimates, observed and predicted y values, coefficient standard errors, and t-values. These data are stored in the “data” slot of an object called SDF in the output list.

9.4 Converting SDF into sf data.frame

To visualize the fields in SDF, let’s first convert it into an sf data.frame.

condo_resale.sf.adaptive <- st_as_sf(gwr.adaptive$SDF) %>%   st_transform(crs=3414)
condo_resale.sf.adaptive.svy21 <- st_transform(condo_resale.sf.adaptive, 3414)
condo_resale.sf.adaptive.svy21  
Simple feature collection with 1436 features and 51 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 14940.85 ymin: 24765.67 xmax: 43352.45 ymax: 48382.81
Projected CRS: SVY21 / Singapore TM
First 10 features:
    Intercept  AREA_SQM        AGE  PROX_CBD PROX_CHILDCARE PROX_ELDERLYCARE
1   2050011.7  9561.892  -9514.634 -120681.9      319266.92       -393417.79
2   1633128.2 16576.853 -58185.479 -149434.2      441102.18        325188.74
3   3433608.2 13091.861 -26707.386 -259397.8     -120116.82        535855.81
4    234358.9 20730.601 -93308.988 2426853.7      480825.28        314783.72
5   2285804.9  6722.836 -17608.018 -316835.5       90764.78       -137384.61
6  -3568877.4  6039.581 -26535.592  327306.1     -152531.19       -700392.85
7  -2874842.4 16843.575 -59166.727 -983577.2     -177810.50       -122384.02
8   2038086.0  6905.135 -17681.897 -285076.6       70259.40        -96012.78
9   1718478.4  9580.703 -14401.128  105803.4     -657698.02       -123276.00
10  3457054.0 14072.011 -31579.884 -234895.4       79961.45        548581.04
   PROX_URA_GROWTH_AREA    PROX_MRT  PROX_PARK PROX_PRIMARY_SCH
1            -159980.20  -299742.96 -172104.47        242668.03
2            -142290.39 -2510522.23  523379.72       1106830.66
3            -253621.21  -936853.28  209099.85        571462.33
4           -2679297.89 -2039479.50 -759153.26       3127477.21
5             303714.81   -44567.05  -10284.62         30413.56
6             -28051.25   733566.47 1511488.92        320878.23
7            1397676.38 -2745430.34  710114.74       1786570.95
8             269368.71   -14552.99   73533.34         53359.73
9            -361974.72  -476785.32 -132067.59        -40128.92
10           -150024.38 -1503835.53  574155.47        108996.67
   PROX_SHOPPING_MALL PROX_BUS_STOP  NO_Of_UNITS FAMILY_FRIENDLY  FREEHOLD
1          300881.390     1210615.4  104.8290640       -9075.370  303955.6
2          -87693.378     1843587.2 -288.3441183      310074.664  396221.3
3         -126732.712     1411924.9   -9.5532945        5949.746  168821.7
4          -29593.342     7225577.5 -161.3551620     1556178.531 1212515.6
5           -7490.586      677577.0   42.2659674       58986.951  328175.2
6          258583.881     1086012.6 -214.3671271      201992.641  471873.1
7         -384251.210     5094060.5   -0.9212521      359659.512  408871.9
8          -39634.902      735767.1   30.1741069       55602.506  347075.0
9          276718.757     2815772.4  675.1615559      -30453.297  503872.8
10        -454726.822     2123557.0  -21.3044311     -100935.586  213324.6
         y    yhat    residual CV_Score Stud_residual Intercept_SE AREA_SQM_SE
1  3000000 2886532   113468.16        0    0.38207013     516105.5    823.2860
2  3880000 3466801   413198.52        0    1.01433140     488083.5    825.2380
3  3325000 3616527  -291527.20        0   -0.83780678     963711.4    988.2240
4  4250000 5435482 -1185481.63        0   -2.84614670     444185.5    617.4007
5  1400000 1388166    11834.26        0    0.03404453    2119620.6   1376.2778
6  1320000 1516702  -196701.94        0   -0.72065800   28572883.7   2348.0091
7  3410000 3266881   143118.77        0    0.41291992     679546.6    893.5893
8  1420000 1431955   -11955.27        0   -0.03033109    2217773.1   1415.2604
9  2025000 1832799   192200.83        0    0.52018109     814281.8    943.8434
10 2550000 2223364   326635.53        0    1.10559735    2410252.0   1271.4073
      AGE_SE PROX_CBD_SE PROX_CHILDCARE_SE PROX_ELDERLYCARE_SE
1   5889.782    37411.22          319111.1           120633.34
2   6226.916    23615.06          299705.3            84546.69
3   6510.236    56103.77          349128.5           129687.07
4   6010.511   469337.41          304965.2           127150.69
5   8180.361   410644.47          698720.6           327371.55
6  14601.909  5272846.47         1141599.8          1653002.19
7   8970.629   346164.20          530101.1           148598.71
8   8661.309   438035.69          742532.8           399221.05
9  11791.208    89148.35          704630.7           329683.30
10  9941.980   173532.77          500976.2           281876.74
   PROX_URA_GROWTH_AREA_SE PROX_MRT_SE PROX_PARK_SE PROX_PRIMARY_SCH_SE
1                 56207.39    185181.3     205499.6            152400.7
2                 76956.50    281133.9     229358.7            165150.7
3                 95774.60    275483.7     314124.3            196662.6
4                470762.12    279877.1     227249.4            240878.9
5                474339.56    363830.0     364580.9            249087.7
6               5496627.21    730453.2    1741712.0            683265.5
7                371692.97    375511.9     297400.9            344602.8
8                517977.91    423155.4     440984.4            261251.2
9                153436.22    285325.4     304998.4            278258.5
10               239182.57    571355.7     599131.8            331284.8
   PROX_SHOPPING_MALL_SE PROX_BUS_STOP_SE NO_Of_UNITS_SE FAMILY_FRIENDLY_SE
1               109268.8         600668.6       218.1258           131474.7
2                98906.8         410222.1       208.9410           114989.1
3               119913.3         464156.7       210.9828           146607.2
4               177104.1         562810.8       361.7767           108726.6
5               301032.9         740922.4       299.5034           160663.7
6              2931208.6        1418333.3       602.5571           331727.0
7               249969.5         821236.4       532.1978           129241.2
8               351634.0         775038.4       338.6777           171895.1
9               289872.7         850095.5       439.9037           220223.4
10              265529.7         631399.2       259.0169           189125.5
   FREEHOLD_SE Intercept_TV AREA_SQM_TV     AGE_TV PROX_CBD_TV
1     115954.0    3.9720784   11.614302  -1.615447 -3.22582173
2     130110.0    3.3460017   20.087361  -9.344188 -6.32792021
3     141031.5    3.5629010   13.247868  -4.102368 -4.62353528
4     138239.1    0.5276150   33.577223 -15.524302  5.17080808
5     210641.1    1.0784029    4.884795  -2.152474 -0.77155660
6     374347.3   -0.1249043    2.572214  -1.817269  0.06207388
7     182216.9   -4.2305303   18.849348  -6.595605 -2.84136028
8     216649.4    0.9189786    4.879056  -2.041481 -0.65080678
9     220473.7    2.1104224   10.150733  -1.221345  1.18682383
10    206346.2    1.4343123   11.068059  -3.176418 -1.35360852
   PROX_CHILDCARE_TV PROX_ELDERLYCARE_TV PROX_URA_GROWTH_AREA_TV PROX_MRT_TV
1         1.00048819          -3.2612693            -2.846248368 -1.61864578
2         1.47178634           3.8462625            -1.848971738 -8.92998600
3        -0.34404755           4.1319138            -2.648105057 -3.40075727
4         1.57665606           2.4756745            -5.691404992 -7.28705261
5         0.12990138          -0.4196596             0.640289855 -0.12249416
6        -0.13361179          -0.4237096            -0.005103357  1.00426206
7        -0.33542751          -0.8235874             3.760298131 -7.31116712
8         0.09462126          -0.2405003             0.520038994 -0.03439159
9        -0.93339393          -0.3739225            -2.359121712 -1.67102293
10        0.15961128           1.9461735            -0.627237944 -2.63204802
   PROX_PARK_TV PROX_PRIMARY_SCH_TV PROX_SHOPPING_MALL_TV PROX_BUS_STOP_TV
1   -0.83749312           1.5923022            2.75358842        2.0154464
2    2.28192684           6.7019454           -0.88662640        4.4941192
3    0.66565951           2.9058009           -1.05686949        3.0419145
4   -3.34061770          12.9836105           -0.16709578       12.8383775
5   -0.02820944           0.1220998           -0.02488294        0.9145046
6    0.86781794           0.4696245            0.08821750        0.7656963
7    2.38773567           5.1844351           -1.53719231        6.2029165
8    0.16674816           0.2042469           -0.11271635        0.9493299
9   -0.43301073          -0.1442145            0.95462153        3.3123012
10   0.95831249           0.3290120           -1.71252687        3.3632555
   NO_Of_UNITS_TV FAMILY_FRIENDLY_TV FREEHOLD_TV  Local_R2
1     0.480589953        -0.06902748    2.621347 0.8846744
2    -1.380026395         2.69655779    3.045280 0.8899773
3    -0.045279967         0.04058290    1.197050 0.8947007
4    -0.446007570        14.31276425    8.771149 0.9073605
5     0.141120178         0.36714544    1.557983 0.9510057
6    -0.355762335         0.60891234    1.260522 0.9247586
7    -0.001731033         2.78285441    2.243875 0.8310458
8     0.089093858         0.32346758    1.602012 0.9463936
9     1.534793921        -0.13828365    2.285410 0.8380365
10   -0.082251138        -0.53369623    1.033819 0.9080753
                    geometry
1  POINT (22085.12 29951.54)
2   POINT (25656.84 34546.2)
3   POINT (23963.99 32890.8)
4  POINT (27044.28 32319.77)
5  POINT (41042.56 33743.64)
6   POINT (39717.04 32943.1)
7   POINT (28419.1 33513.37)
8  POINT (40763.57 33879.61)
9  POINT (23595.63 28884.78)
10 POINT (24586.56 33194.31)
gwr.adaptive.output <- as.data.frame(gwr.adaptive$SDF) 
condo_resale.sf.adaptive <- cbind(condo_resale.res.sf, as.matrix(gwr.adaptive.output))

Next, let’s use glimpse() to display the content of the condo_resale.sf.adaptive sf data frame.

glimpse(condo_resale.sf.adaptive)
Rows: 1,436
Columns: 77
$ POSTCODE                <dbl> 118635, 288420, 267833, 258380, 467169, 466472…
$ SELLING_PRICE           <dbl> 3000000, 3880000, 3325000, 4250000, 1400000, 1…
$ AREA_SQM                <dbl> 309, 290, 248, 127, 145, 139, 218, 141, 165, 1…
$ AGE                     <dbl> 30, 32, 33, 7, 28, 22, 24, 24, 27, 31, 17, 22,…
$ PROX_CBD                <dbl> 7.941259, 6.609797, 6.898000, 4.038861, 11.783…
$ PROX_CHILDCARE          <dbl> 0.16597932, 0.28027246, 0.42922669, 0.39473543…
$ PROX_ELDERLYCARE        <dbl> 2.5198118, 1.9333338, 0.5021395, 1.9910316, 1.…
$ PROX_URA_GROWTH_AREA    <dbl> 6.618741, 7.505109, 6.463887, 4.906512, 6.4106…
$ PROX_HAWKER_MARKET      <dbl> 1.76542207, 0.54507614, 0.37789301, 1.68259969…
$ PROX_KINDERGARTEN       <dbl> 0.05835552, 0.61592412, 0.14120309, 0.38200076…
$ PROX_MRT                <dbl> 0.5607188, 0.6584461, 0.3053433, 0.6910183, 0.…
$ PROX_PARK               <dbl> 1.1710446, 0.1992269, 0.2779886, 0.9832843, 0.…
$ PROX_PRIMARY_SCH        <dbl> 1.6340256, 0.9747834, 1.4715016, 1.4546324, 0.…
$ PROX_TOP_PRIMARY_SCH    <dbl> 3.3273195, 0.9747834, 1.4715016, 2.3006394, 0.…
$ PROX_SHOPPING_MALL      <dbl> 2.2102717, 2.9374279, 1.2256850, 0.3525671, 1.…
$ PROX_SUPERMARKET        <dbl> 0.9103958, 0.5900617, 0.4135583, 0.4162219, 0.…
$ PROX_BUS_STOP           <dbl> 0.10336166, 0.28673408, 0.28504777, 0.29872340…
$ NO_Of_UNITS             <dbl> 18, 20, 27, 30, 30, 31, 32, 32, 32, 32, 34, 34…
$ FAMILY_FRIENDLY         <dbl> 0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0…
$ FREEHOLD                <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1…
$ LEASEHOLD_99YR          <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ LOG_SELLING_PRICE       <dbl> 14.91412, 15.17135, 15.01698, 15.26243, 14.151…
$ MLR_RES                 <dbl> -1489099.55, 415494.57, 194129.69, 1088992.71,…
$ Intercept               <dbl> 2050011.67, 1633128.24, 3433608.17, 234358.91,…
$ AREA_SQM.1              <dbl> 9561.892, 16576.853, 13091.861, 20730.601, 672…
$ AGE.1                   <dbl> -9514.634, -58185.479, -26707.386, -93308.988,…
$ PROX_CBD.1              <dbl> -120681.94, -149434.22, -259397.77, 2426853.66…
$ PROX_CHILDCARE.1        <dbl> 319266.925, 441102.177, -120116.816, 480825.28…
$ PROX_ELDERLYCARE.1      <dbl> -393417.795, 325188.741, 535855.806, 314783.72…
$ PROX_URA_GROWTH_AREA.1  <dbl> -159980.203, -142290.389, -253621.206, -267929…
$ PROX_MRT.1              <dbl> -299742.96, -2510522.23, -936853.28, -2039479.…
$ PROX_PARK.1             <dbl> -172104.47, 523379.72, 209099.85, -759153.26, …
$ PROX_PRIMARY_SCH.1      <dbl> 242668.03, 1106830.66, 571462.33, 3127477.21, …
$ PROX_SHOPPING_MALL.1    <dbl> 300881.390, -87693.378, -126732.712, -29593.34…
$ PROX_BUS_STOP.1         <dbl> 1210615.44, 1843587.22, 1411924.90, 7225577.51…
$ NO_Of_UNITS.1           <dbl> 104.8290640, -288.3441183, -9.5532945, -161.35…
$ FAMILY_FRIENDLY.1       <dbl> -9075.370, 310074.664, 5949.746, 1556178.531, …
$ FREEHOLD.1              <dbl> 303955.61, 396221.27, 168821.75, 1212515.58, 3…
$ y                       <dbl> 3000000, 3880000, 3325000, 4250000, 1400000, 1…
$ yhat                    <dbl> 2886531.8, 3466801.5, 3616527.2, 5435481.6, 13…
$ residual                <dbl> 113468.16, 413198.52, -291527.20, -1185481.63,…
$ CV_Score                <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ Stud_residual           <dbl> 0.38207013, 1.01433140, -0.83780678, -2.846146…
$ Intercept_SE            <dbl> 516105.5, 488083.5, 963711.4, 444185.5, 211962…
$ AREA_SQM_SE             <dbl> 823.2860, 825.2380, 988.2240, 617.4007, 1376.2…
$ AGE_SE                  <dbl> 5889.782, 6226.916, 6510.236, 6010.511, 8180.3…
$ PROX_CBD_SE             <dbl> 37411.22, 23615.06, 56103.77, 469337.41, 41064…
$ PROX_CHILDCARE_SE       <dbl> 319111.1, 299705.3, 349128.5, 304965.2, 698720…
$ PROX_ELDERLYCARE_SE     <dbl> 120633.34, 84546.69, 129687.07, 127150.69, 327…
$ PROX_URA_GROWTH_AREA_SE <dbl> 56207.39, 76956.50, 95774.60, 470762.12, 47433…
$ PROX_MRT_SE             <dbl> 185181.3, 281133.9, 275483.7, 279877.1, 363830…
$ PROX_PARK_SE            <dbl> 205499.6, 229358.7, 314124.3, 227249.4, 364580…
$ PROX_PRIMARY_SCH_SE     <dbl> 152400.7, 165150.7, 196662.6, 240878.9, 249087…
$ PROX_SHOPPING_MALL_SE   <dbl> 109268.8, 98906.8, 119913.3, 177104.1, 301032.…
$ PROX_BUS_STOP_SE        <dbl> 600668.6, 410222.1, 464156.7, 562810.8, 740922…
$ NO_Of_UNITS_SE          <dbl> 218.1258, 208.9410, 210.9828, 361.7767, 299.50…
$ FAMILY_FRIENDLY_SE      <dbl> 131474.73, 114989.07, 146607.22, 108726.62, 16…
$ FREEHOLD_SE             <dbl> 115954.0, 130110.0, 141031.5, 138239.1, 210641…
$ Intercept_TV            <dbl> 3.9720784, 3.3460017, 3.5629010, 0.5276150, 1.…
$ AREA_SQM_TV             <dbl> 11.614302, 20.087361, 13.247868, 33.577223, 4.…
$ AGE_TV                  <dbl> -1.6154474, -9.3441881, -4.1023685, -15.524301…
$ PROX_CBD_TV             <dbl> -3.22582173, -6.32792021, -4.62353528, 5.17080…
$ PROX_CHILDCARE_TV       <dbl> 1.000488185, 1.471786337, -0.344047555, 1.5766…
$ PROX_ELDERLYCARE_TV     <dbl> -3.26126929, 3.84626245, 4.13191383, 2.4756745…
$ PROX_URA_GROWTH_AREA_TV <dbl> -2.846248368, -1.848971738, -2.648105057, -5.6…
$ PROX_MRT_TV             <dbl> -1.61864578, -8.92998600, -3.40075727, -7.2870…
$ PROX_PARK_TV            <dbl> -0.83749312, 2.28192684, 0.66565951, -3.340617…
$ PROX_PRIMARY_SCH_TV     <dbl> 1.59230221, 6.70194543, 2.90580089, 12.9836104…
$ PROX_SHOPPING_MALL_TV   <dbl> 2.753588422, -0.886626400, -1.056869486, -0.16…
$ PROX_BUS_STOP_TV        <dbl> 2.0154464, 4.4941192, 3.0419145, 12.8383775, 0…
$ NO_Of_UNITS_TV          <dbl> 0.480589953, -1.380026395, -0.045279967, -0.44…
$ FAMILY_FRIENDLY_TV      <dbl> -0.06902748, 2.69655779, 0.04058290, 14.312764…
$ FREEHOLD_TV             <dbl> 2.6213469, 3.0452799, 1.1970499, 8.7711485, 1.…
$ Local_R2                <dbl> 0.8846744, 0.8899773, 0.8947007, 0.9073605, 0.…
$ coords.x1               <dbl> 22085.12, 25656.84, 23963.99, 27044.28, 41042.…
$ coords.x2               <dbl> 29951.54, 34546.20, 32890.80, 32319.77, 33743.…
$ geometry                <POINT [m]> POINT (22085.12 29951.54), POINT (25656.…
summary(gwr.adaptive$SDF$yhat)
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
  171347  1102001  1385528  1751842  1982307 13887901 

9.5 Visualizing Local R2

The code chunks below is used to create an interactive point symbol map.

tmap_mode("view") 
tm_shape(mpsz_svy21)+   
  tm_polygons(alpha = 0.1) + 
  tm_shape(condo_resale.sf.adaptive) +     
  tm_dots(col = "Local_R2",           
          border.col = "gray60",           
          border.lwd = 1) +   
  tm_view(set.zoom.limits = c(11,14))
tmap_mode('plot')

9.6 Visualizing Coefficient Estimates

The code chunks below is used to create an interactive point symbol map.

tmap_mode("view") 
AREA_SQM_SE <- tm_shape(mpsz_svy21)+   
  tm_polygons(alpha = 0.1) + 
  tm_shape(condo_resale.sf.adaptive) +     
  tm_dots(col = "AREA_SQM_SE",           
          border.col = "gray60",           
          border.lwd = 1) +   
  tm_view(set.zoom.limits = c(11,14))  
AREA_SQM_TV <- tm_shape(mpsz_svy21)+   
  tm_polygons(alpha = 0.1) + 
  tm_shape(condo_resale.sf.adaptive) +     
  tm_dots(col = "AREA_SQM_TV",           
          border.col = "gray60",           
          border.lwd = 1) +   
  tm_view(set.zoom.limits = c(11,14))  
tmap_arrange(AREA_SQM_SE, AREA_SQM_TV,               
             asp=1, ncol=2,              
             sync = TRUE)
tmap_mode('plot')

BY URA PLANNING REGION

tm_shape(mpsz_svy21[mpsz_svy21$REGION_N=="CENTRAL REGION", ])+   
  tm_polygons()+ 
  tm_shape(condo_resale.sf.adaptive) +    
  tm_bubbles(col = "Local_R2",            
             size = 0.15,           
             border.col = "gray60",  
             border.lwd = 1)

10. REFERENCES

Gollini I, Lu B, Charlton M, Brunsdon C, Harris P (2015) “GWmodel: an R Package for exploring Spatial Heterogeneity using Geographically Weighted Models”. Journal of Statistical Software, 63(17):1-50, http://www.jstatsoft.org/v63/i17/

Lu B, Harris P, Charlton M, Brunsdon C (2014) “The GWmodel R Package: further topics for exploring Spatial Heterogeneity using GeographicallyWeighted Models”. Geo-spatial Information Science 17(2): 85-101, http://www.tandfonline.com/doi/abs/10.1080/1009502.2014.917453